Expressive speech synthesis using a concatenative synthesizer
Authors
Abstract
This paper describes an experiment in synthesizing four emotional states (anger, happiness, sadness, and neutral) using a concatenative speech synthesizer. To achieve this, five emotionally (i.e., semantically) unbiased target sentences were prepared. Then, separate speech inventories, each comprising the target diphones for one of the above emotions, were recorded. Combining each emotion's prosody with each emotion's inventory gave 16 prosody-inventory combinations, which, applied to the five sentences, resulted in 80 synthetic test sentences. The results were evaluated in listening tests with 33 naïve listeners. Synthesized anger was recognized with 86.1% accuracy, sadness with 89.1%, happiness with 44.2%, and neutral emotion with 81.8%. According to our results, anger was classified as inventory dominant, and sadness and neutral as prosody dominant. The results were not sufficient to draw similar conclusions regarding happiness. The highest recognition accuracies were achieved for sentences synthesized using prosody and a diphone inventory belonging to the same emotion.
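The stimulus count in the abstract follows from crossing prosody with inventory. The following is a minimal sketch of that arithmetic, assuming each of the four emotions supplies both a prosody pattern and a diphone inventory and that every pairing is synthesized for each of the five target sentences. The variable names are illustrative and do not come from the paper.

```python
from itertools import product

EMOTIONS = ["anger", "happiness", "sadness", "neutral"]
N_SENTENCES = 5  # five semantically unbiased target sentences

# Every (prosody, inventory) pairing, including the four matched cases.
combinations = list(product(EMOTIONS, EMOTIONS))

# One stimulus per combination per target sentence (assumed design).
stimuli = [
    {"sentence": s, "prosody": p, "inventory": i}
    for s in range(1, N_SENTENCES + 1)
    for p, i in combinations
]

# Pairings where prosody and inventory belong to the same emotion.
matched = [c for c in combinations if c[0] == c[1]]

print(len(combinations), len(stimuli), len(matched))  # 16 80 4
```

Under these assumptions, 4 of the 16 combinations pair prosody and inventory from the same emotion; the other 12 are mismatched.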
Similar resources
Expressive Speech Recognition and Synthesis as Enabling Technologies for Affective Robot-Child Communication
This paper presents our recent and current work on expressive speech synthesis and recognition as enabling technologies for affective robot-child interaction. We show that current expression recognition systems could be used to discriminate between several archetypical emotions, but also that the old adage “there’s no data like more data” is more than ever valid in this field. A new speech synt...
Generating emotional speech with a concatenative synthesizer
We describe an attempt to synthesize emotional speech with a concatenative speech synthesizer using a parameter space covering not only f0, duration and amplitude, but also voice quality parameters, spectral energy distribution, harmonics-to-noise ratio, and articulatory precision. The application of this extended parameter set offers the possibility to combine the high segmental quality of c...
Concatenative Synthesis of Expressive Saxophone Performance
In this paper we present a systematic approach to applying expressive performance models to non-expressive score transcriptions and synthesizing the results by means of concatenative synthesis. Expressive performance models are built from score transcriptions and recorded performances by means of decision tree rule induction, and those models are used both to transform inexpressive input scores...
Using Concatenative Synthesis for Expressive Performance in Jazz Saxophone
We present here a concatenative sample-based saxophone synthesizer using an induced performance model intended for expressive synthesis. The system consists of three main parts. The first part provides the analysis of saxophone expressive performance recordings and the extraction of descriptors related to different temporal levels. With the obtained descriptors and the analyzed samples, we cons...
Epoch synchronous non-overlap-add (ESNOLA) method-based concatenative speech synthesis system for Bangla
In the last decade there has been a shift towards the development of speech synthesizers using concatenative synthesis techniques instead of parametric synthesis. There are a number of different methodologies for concatenative synthesis, such as TDPSOLA, PSOLA, and MBROLA. This paper describes a concatenative speech synthesis system based on the Epoch Synchronous Non-Overlap-Add (ESNOLA) technique, for st...